An overview of textual semantic similarity measures based on web intelligence

Author: Martinez-Gil Jorge
Publication date: 30/06/2012

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexicographically similar is a key challenge in many computer-related fields. The problem is that traditional approaches to semantic similarity measurement are not suitable for all situations; for example, many of them fail to deal with terms not covered by synonym dictionaries or cannot cope with acronyms, abbreviations, buzzwords, brand names, proper nouns, and so on. In this paper, we present and evaluate a collection of emerging techniques developed to avoid this problem. These techniques use some form of web intelligence to determine the degree of similarity between text expressions, and they implement a variety of paradigms, including the study of co-occurrence, text snippet comparison, frequent pattern finding, and search log analysis. The goal is to substitute the traditional techniques where necessary.

ZENODO
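
As a rough illustration of the co-occurrence paradigm mentioned in the abstract above, the following sketch computes the Normalized Google Distance, one well-known measure that derives a distance from search-engine hit counts. The abstract does not say whether the paper evaluates this exact measure, and the function name and the hit counts used here are hypothetical placeholders rather than real search results.

```python
import math


def normalized_web_distance(fx, fy, fxy, n):
    """Normalized Google Distance computed from hit counts.

    fx, fy : number of pages containing term x / term y
    fxy    : number of pages containing both terms
    n      : (assumed) total number of indexed pages
    Lower values mean the terms co-occur more often, i.e. are more related.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # No occurrences or no co-occurrence: treat the terms as unrelated.
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical hit counts, chosen only to make the arithmetic easy to follow.
identical = normalized_web_distance(1000, 1000, 1000, 10**6)
partial = normalized_web_distance(1000, 1000, 10, 10**6)
print(identical, partial)
```

Terms that always appear together get a distance of 0, and the distance grows as the joint count shrinks relative to the individual counts. In practice the counts would come from a search engine or a large corpus, which is an assumption outside what the abstract specifies.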

A Survey on Legal Question Answering Systems

Author: Martinez-Gil Jorge
Publication date: 12/10/2021

Many legal professionals think that the explosion of information about local, regional, national, and international legislation makes their practice more costly, time-consuming, and even error-prone. The two main reasons for this are that most legislation is unstructured and that the tremendous amount and pace with which laws are released causes information overload in their daily tasks. In the legal domain, the research community agrees that a system able to generate automatic responses to legal questions could have a substantial practical impact on daily activities. The degree of usefulness is such that even a semi-automatic solution could significantly help to reduce the workload. This is mainly because a Question Answering system could automatically process a massive amount of legal resources to answer a question or doubt in seconds, which means that it could save many professionals in the legal sector effort, money, and time. In this work, we quantitatively and qualitatively survey the solutions that currently exist to meet this challenge. Comment: 57 pages, 1 figure, 10 tables

arXiv.org e-Print Archive

Automatic Design of Semantic Similarity Ensembles Using Grammatical Evolution

Author: Martinez-Gil Jorge
Publication date: 01/08/2023

Semantic similarity measures are widely used in natural language processing to support various computer-related tasks. However, no single semantic similarity measure is the most appropriate for all tasks, and researchers often use ensemble strategies to ensure performance. This research work proposes a method for automatically designing semantic similarity ensembles. Our proposed method uses grammatical evolution, for the first time, to automatically select and aggregate measures from a pool of candidates to create an ensemble that maximizes correlation to human judgment. The method is evaluated on several benchmark datasets and compared to state-of-the-art ensembles, showing that it can significantly improve similarity assessment accuracy and outperform existing methods in some cases. As a result, our research demonstrates the potential of using grammatical evolution to automatically compare text and shows the benefits of using ensembles for semantic similarity tasks. The source code that illustrates our approach can be downloaded from https://github.com/jorge-martinez-gil/sesige. Comment: 29 pages

arXiv.org e-Print Archive
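
The objective described in the abstract above, aggregating several measures so that the combined score correlates as closely as possible with human judgment, can be sketched as follows. This is not the paper's method: instead of grammatical evolution it simply tries three fixed aggregators (mean, max, min), and all names and scores are hypothetical toy values.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical pool of aggregation strategies; a real system would search a
# much larger space of expressions.
AGGREGATORS = {"mean": mean, "max": max, "min": min}


def best_aggregator(measure_scores, human_scores):
    """Return the aggregator name with the highest correlation, and that correlation.

    measure_scores: one tuple per text pair, holding one score per measure.
    human_scores:   one human similarity rating per text pair.
    """
    results = {}
    for name, aggregate in AGGREGATORS.items():
        combined = [aggregate(scores) for scores in measure_scores]
        results[name] = pearson(combined, human_scores)
    best = max(results, key=results.get)
    return best, results[best]


# Toy data: two measures scored on four text pairs, plus human ratings.
measure_scores = [(0.9, 0.7), (0.2, 0.4), (0.6, 0.5), (0.1, 0.3)]
human_scores = [0.8, 0.3, 0.55, 0.2]
print(best_aggregator(measure_scores, human_scores))
```

The correlation function plays the role of a fitness function here. An evolutionary search such as the one the abstract describes would explore candidate aggregation expressions guided by a score of this kind rather than enumerating a fixed list.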

Framework to Automatically Determine the Quality of Open Data Catalogs

Author: Martinez-Gil Jorge
Publication date: 28/07/2023

Data catalogs play a crucial role in modern data-driven organizations by facilitating the discovery, understanding, and utilization of diverse data assets. However, ensuring their quality and reliability is complex, especially in open and large-scale data environments. This paper proposes a framework to automatically determine the quality of open data catalogs, addressing the need for efficient and reliable quality assessment mechanisms. Our framework can analyze various core quality dimensions, such as accuracy, completeness, consistency, scalability, and timeliness; offer several alternatives for assessing compatibility and similarity across such catalogs; and implement a set of non-core quality dimensions such as provenance, readability, and licensing. The goal is to empower data-driven organizations to make informed decisions based on trustworthy and well-curated data assets. The source code that illustrates our approach can be downloaded from https://www.github.com/jorge-martinez-gil/dataq/. Comment: 25 pages

arXiv.org e-Print Archive

A Novel Approach for Learning How to Automatically Match Job Offers and Candidate Profiles

Authors: Martinez-Gil Jorge, Paoletti Alejandra Lorena, Pichler Mario
Publication date: 07/09/2017

Automatic matching of job offers and job candidates is a major problem for a number of organizations and job applicants that, if successfully addressed, could have a positive impact in many countries around the world. In this context, it is widely accepted that semi-automatic matching algorithms between job and candidate profiles would provide a vital technology for making recruitment processes faster, more accurate, and more transparent. In this work, we present our research towards achieving a realistic matching approach for satisfactorily addressing this challenge. This novel approach relies on a matching learning solution that aims to learn from past solved cases in order to accurately predict the results in new situations. An empirical study shows that our approach is able to beat solutions with no learning capabilities by a wide margin. Comment: 15 pages, 6 figures

arXiv.org e-Print Archive

Annotated Bibliography on Ontology Matching

Author: Martinez-Gil Jorge

Annotated Bibliography on Ontology Matching

ZENODO

FigShare

Annotated Bibliography on Knowledge Bases

Author: Martinez-Gil Jorge

Annotated Bibliography on Knowledge Bases

ZENODO

Fuzzy Logics for Multiple Choice Question Answering

Author: Martinez-Gil Jorge
Publication date: 27/01/2022

We have recently witnessed how solutions based on neural-inspired architectures have become the most popular for Multiple-Choice Question Answering. However, solutions of this kind are difficult to interpret, require many resources for training, and present obstacles to transfer learning. In this work, we move away from this mainstream to explore new methods based on fuzzy logic that can cope with these problems. The results that can be obtained are in line with those of cutting-edge neural solutions, but with advantages such as ease of interpretation, low training cost in terms of the resources needed, and the possibility of transferring the acquired knowledge in a much more straightforward and intuitive way.

E-LIS

AI-Based Recruiting: The Future Ahead

Author: Martinez-Gil Jorge
Publication date: 28/01/2021

The Human Resources industry is currently being revolutionized by the automation of tedious and time-consuming aspects of its processes. Since AI paradigms such as deep neural networks and other machine learning methods can make accurate predictions and analyze vast amounts of information, these technologies are suitable for facing some of the major challenges in this domain. We overview here how this industry is changing, from the automatic screening of candidates to bias removal in most of the processes, through techniques for the automatic discovery of potential employees and new advances for improving the candidate's experience.

E-LIS

NEFUSI: NeuroFuzzy Similarity. Final Report

Author: Martinez-Gil Jorge
Publication date: 20/12/2022

This research work presents the final report for the NEFUSI project. We present here our research findings on building neurofuzzy models that automatically evaluate semantic textual similarity in an accurate and timely manner. We show that neural networks and fuzzy logic have different features that make them suitable for certain problems but unsuitable for others. Neural networks, on the one hand, are valuable tools for identifying patterns. However, they do not make it easy for people to understand their decisions. On the other hand, interpretation is possible within fuzzy logic systems, but they cannot automatically derive the rules they use to make those decisions. These constraints served as the primary reason for developing a novel intelligent hybrid system, which combines the two approaches to circumvent both limitations simultaneously.

E-LIS